Image Captioning Through Image Transformer
Authors
Abstract
Automatic captioning of images is a task that combines the challenges of image analysis and text generation. One important aspect of captioning is the notion of attention: how to decide what to describe and in which order. Inspired by the successes in text analysis and translation, previous works have proposed the transformer architecture for image captioning. However, the structure between the semantic units in images (usually the detected regions from an object detection model) and in sentences (each single word) is different. Limited work has been done to adapt the transformer's internal architecture to images. In this work, we introduce the image transformer, which consists of a modified encoding transformer and an implicit decoding transformer, motivated by the relative spatial relationship between image regions. Our design widens the original transformer layer's inner architecture to adapt to the structure of images. With only region features as inputs, our model achieves new state-of-the-art performance on both the MSCOCO offline and online testing benchmarks. The code is available at https://github.com/wtliao/ImageTransformer.
Similar resources
Phrase-based Image Captioning
Generating a novel textual description of an image is an interesting problem that connects computer vision and natural language processing. In this paper, we present a simple model that is able to generate descriptive sentences given a sample image. This model has a strong focus on the syntax of the descriptions. We train a purely bilinear model that learns a metric between an image representat...
Domain-Specific Image Captioning
We present a data-driven framework for image caption generation which incorporates visual and textual features with varying degrees of spatial structure. We propose the task of domain-specific image captioning, where many relevant visual details cannot be captured by off-the-shelf general-domain entity detectors. We extract previously-written descriptions from a database and adapt them to new q...
Convolutional Image Captioning
Image captioning is an important but challenging task, applicable to virtual assistants, editing tools, image indexing, and support of the disabled. Its challenges are due to the variability and ambiguity of possible image descriptions. In recent years significant progress has been made in image captioning, using Recurrent Neural Networks powered by long short-term memory (LSTM) units. Despite ...
Image Captioning with Attention
In the past few years, neural networks have fueled dramatic advances in image classification. Emboldened, researchers are looking for more challenging applications for computer vision and artificial intelligence systems. They seek not only to assign numerical labels to input data, but to describe the world in human terms. Image and video captioning is among the most popular applications in this t...
Image Transformer
Image generation has been successfully cast as an autoregressive sequence generation or transformation problem. Recent work has shown that self-attention is an effective way of modeling textual sequences. In this work, we generalize a recently proposed model architecture based on self-attention, the Transformer, to a sequence modeling formulation of image generation with a tractable likelihood....
Journal
Journal title: Lecture Notes in Computer Science
Year: 2021
ISSN: 1611-3349, 0302-9743
DOI: https://doi.org/10.1007/978-3-030-69538-5_10